-
Notifications
You must be signed in to change notification settings - Fork 7
Utilize Polars.DataFrame for performance in ModelbitComponent #80
Conversation
ykeremy
commented
Oct 4, 2023
•
edited
Loading
edited
- Does this PR have impact on local development experience? If yes, make sure you have a plan and add the documentations to address issues that come with the change
- bump version
- make a release
- publish to pypi service
❤️ I love the implementation. got the question about the composite identifier tho |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
) | ||
df = df.filter(pl.col(feature_name).is_not_null()) | ||
if len(df) > 1: | ||
raise WyvernFeatureValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a log line here / reason here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments to address
A DataFrame that contains all the real-time features. | ||
""" | ||
grouped_features = defaultdict(list) | ||
for key, value in real_time_feature_dfs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a dict comprehension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is n = small (5)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question.
I don't think we can just do a dict comprehension because we're appending to a list since it's acceptable to have collisions.
However, n isn't necessarily a small number either. I'd say let's just see how this performs first. I believe the ideal upgrade here might be changing how the real time features are calculated.